docs(keploy-cloud): add AI Agent for Smart Test Sets guide #883
53be8ac to 7dd9f01
Document the ready-made smart-set agent skill: how an AI coding assistant (Claude Code, Cursor, ...) diagnoses a failing smart-set replay and adds new smart tests on a branch via the Keploy MCP tools, plus the mandatory replay flags, the branch boundary, and the full skill file to install.

Adds 'rebase' to the Vale vocabulary (a git term the branching docs use).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: charankamarapu <charan@keploy.io>
7dd9f01 to 7c74166
The lean-MCP change makes tool-search the default for all MCP clients: tools/list shows only the meta-tools, and the full catalog is reached by name via get_tool_schema/search_tools + invoke_tool (tools stay callable — hiding affects discovery, not reachability).

Update the two legacy agent docs that still assumed the full catalog is listed:
- k8s-proxy-llm-workflow.md: add the 'only meta-tools = tool-search mode, not a missing config' guidance to Hard rule 0 (ported from the local keploy agent skill), so an agent doesn't misread the short list as 'MCP not configured'.
- agent-test-generation.md: add a tool-search note by the MCP section and fix the 'discovers tools via tools/list' step in How it Works.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: charankamarapu <charan@keploy.io>
The 'Verify the MCP wiring' step still said 'you should see ~100 tools; zero means config failed', which contradicted the new Hard rule 0 (only the meta-tools showing is normal tool-search mode). Reworded it to accept either the full catalog OR just the meta-tools as a healthy state; only zero keploy tools means the config didn't load.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: charankamarapu <charan@keploy.io>
charankamarapu left a comment
Reviewed the new smart-set-agent page (links/anchors/ProductTier all check out and the build claim holds). Two internal-consistency issues between the prose and the shipped SKILL.md — left inline.
4. **Validate on the branch** with `keploy cloud replay`.
5. **Report and stop** — you review the branch diff and merge; merge reconciles `imported-*` to stable `test-N`.

## Replay flags the agent always uses
Flag table overpromises vs. the shipped SKILL.md. This table presents six flags as required / always used, but the embedded skill — the artifact a reader actually installs and the agent follows — only mandates two: its Discovery section marks --replay-source smart-set and --freezeTime as MANDATORY, and the A4/B4 replay commands show only those. --cluster, --branch-name, --disableReportUpload=false, and --strict-failure appear nowhere in the skill.
Result: an installed agent omits --cluster and hits the exact no active clusters found error this table warns about, or omits --disableReportUpload=false and the run never shows on the dashboard — contradicting "the agent always uses." Fix by adding the four flags to the skill's replay invocations, or soften this table to "recommended."
Fixed by making the skill match the table (your first option). The embedded skill's Discovery now carries a single canonical replay command that lists all the flags the table promises:
keploy cloud replay --app <ns.deployment> --branch-name <git branch> --cluster <origin.clusterName> --replay-source smart-set --freezeTime --disableReportUpload=false --strict-failure
…and A4/B4 now say "replay with the canonical command from Discovery (all flags)" instead of showing only --replay-source/--freezeTime. So an installed agent uses --cluster (no more no active clusters found) and --disableReportUpload=false (run shows on the dashboard) — the "always uses" table is now true. bcc6b64.
### Cursor

Save the skill as `.cursor/skills/smart-set/SKILL.md` in your project root. Cursor loads project skills automatically; the agent invokes it when your prompt matches a failing smart-set replay or a request to add smart tests.
Verify the .cursor/skills/ auto-load claim. Cursor's documented project-context mechanism is .cursor/rules/*.mdc (project rules); a .cursor/skills/SKILL.md directory that auto-loads isn't a known Cursor convention. If it's wrong, Cursor users follow these steps and the skill never auto-invokes. Please confirm against current Cursor docs; if unsupported, point them at .cursor/rules/ (.mdc) instead. The Claude Code path below (.claude/skills/smart-set/SKILL.md) is correct.
Verified — the path is correct, and I added the doc link so readers can confirm. .cursor/skills/<name>/SKILL.md is Cursor's Agent Skills mechanism (auto-discovered from .cursor/skills/), documented at https://cursor.com/docs/context/skills — distinct from .cursor/rules/*.mdc (always-on project rules), which is the older convention you're thinking of. It also matches the existing k8s-proxy-llm-workflow doc (which uses .cursor/skills/keploy/SKILL.md and makes the same skills-vs-rules distinction), and it's the path the validation runs actually loaded via cursor-agent. Updated the Cursor section to link the docs and call out the skill-vs-rules distinction. bcc6b64.
…uments + verify Cursor path

Review feedback on #883:
- Flag table vs skill mismatch: the 'Replay flags the agent always uses' table listed 6 flags, but the embedded skill only mandated --replay-source/--freezeTime (so an installed agent would omit --cluster and hit 'no active clusters found'). Gave the skill's Discovery a single canonical replay command listing ALL the flags, and pointed A4/B4 at it — the skill now matches the table.
- Cursor install path: verified .cursor/skills/<name>/SKILL.md is correct against Cursor's Agent Skills docs (cursor.com/docs/context/skills, auto-discovered) and the existing k8s-proxy-llm-workflow doc; added the docs link and the skills-vs-.cursor/rules distinction so readers can confirm.

(The canonical command is inline, not a nested code fence — a fenced block inside the embedded SKILL.md block was closing it early.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: charankamarapu <charan@keploy.io>
charankamarapu left a comment
Reviewed as a principal engineer. The page is well-structured and the SKILL.md is genuinely useful — clear routines, good guardrails (branch-first, never-merge), and the cross-links/anchors all resolve. My comments are about internal consistency: this PR introduces the smart-set page and edits two legacy pages to describe the same tool-search behavior, so the three need to agree, and the page body needs to agree with the embedded SKILL.md it tells people to install. Nothing blocking; the inline notes below are the high-value fixes.
## Hard rules

0. **Native MCP transport only.** Verify the Keploy MCP tools are loaded. If your tool list shows only the meta-tools (`get_auth_status`, `search_tools`, `get_tool_schema`, `invoke_tool`), the real tools are hidden server-side to save context — fetch their schemas in ONE batched `get_tool_schema({names:[…]})` call, then run each via `invoke_tool({name, arguments})`. Smart-set names: `listApps`, `getApp`, `listBranches`, `create_branch`, `listTestReports`, `getTestReportFull`, `listSmartTestCases`, `updateSmartTestCase`, `setSmartTestCaseObsolete`, `deleteSmartTestCase`, `upsertSmartMock`, `deleteSmartMock`, `getMock`, `uploadRecordingBundle`.
Meta-tool list is inconsistent with the rest of this PR. Here the meta-tools are listed as four — get_auth_status, search_tools, get_tool_schema, invoke_tool — but the two legacy pages this same PR edits both list five, including get_setup_instructions:
- agent-test-generation.md: get_auth_status, get_setup_instructions, search_tools, get_tool_schema, invoke_tool
- k8s-proxy-llm-workflow.md: same five.
The failure mode is concrete: this rule keys the whole tool-search detection on "if your tool list shows only the meta-tools." If the server actually exposes get_setup_instructions too, an agent comparing against this 4-item set sees an unexpected extra tool and can misclassify the state. Please align this list to the canonical five.
Aligned to the canonical five. Hard rule 0 now lists get_auth_status, get_setup_instructions, search_tools, get_tool_schema, invoke_tool — matching both legacy pages. I also keyed the detection on "and none of the Smart-set names below" rather than an exact-set match, so if the server exposes an extra always-visible onboarding tool the agent still classifies correctly (keys off the absence of the domain tools, not the exact meta count). af641cb.
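The detection rule described in this reply can be sketched in Python. This is a minimal illustration, not the skill's actual logic: the function name `classify_tool_list` and the three state labels are hypothetical, and it assumes the client surfaces bare, unprefixed tool names. The tool names are the ones Hard rule 0 lists after this fix.

```python
# Meta-tools that stay visible in tool-search mode (the canonical five).
META_TOOLS = {
    "get_auth_status",
    "get_setup_instructions",
    "search_tools",
    "get_tool_schema",
    "invoke_tool",
}

# Domain tools the smart-set routines use (listBranches already dropped).
SMART_SET_TOOLS = {
    "listApps", "getApp", "create_branch", "listTestReports",
    "getTestReportFull", "listSmartTestCases", "updateSmartTestCase",
    "setSmartTestCaseObsolete", "deleteSmartTestCase", "upsertSmartMock",
    "deleteSmartMock", "getMock", "uploadRecordingBundle",
}


def classify_tool_list(visible):
    """Classify the MCP state from the tool names a client lists.

    Keys off the absence of domain tools rather than an exact meta-tool
    count, so an extra always-visible onboarding tool cannot cause a
    misclassification.
    """
    names = set(visible)
    if names & SMART_SET_TOOLS:
        return "full-catalog"   # domain tools are listed directly
    if names & META_TOOLS:
        return "tool-search"    # healthy: reach tools via get_tool_schema/invoke_tool
    return "not-loaded"         # no Keploy tools at all: config did not load
```

Under these assumptions, a list containing the five meta-tools plus one unknown extra tool still classifies as `tool-search`, which is the behavior the reply describes.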
| `--replay-source smart-set` | Replay the deduplicated smart-set cases. Without it the CLI defaults to `latest-release` and replays raw per-release recordings instead. |
| `--cluster <name>` | The recording cluster (`origin.clusterName`); a `no active clusters found` error usually means this flag was omitted. |
| `--branch-name <git branch>` | Replay the branch view, including the agent's edits. |
| `--freezeTime` | Required when the app is built with the Go `faketime` agent, so `time.Now()` matches the recording and timestamp-bearing mocks still match. See [Time freezing](/docs/keploy-cloud/time-freezing/). |
--freezeTime is listed under a table titled "Replay flags the agent always uses" but described as conditional ("Required when the app is built with the Go faketime agent"). The SKILL.md compounds this: Discovery step 4 hardcodes --freezeTime into the canonical command and says "use ALL these flags on every replay", then parenthetically re-qualifies it as faketime-only. Net effect for an agent following the literal instruction: it passes --freezeTime on every replay, including apps not built with faketime. Either (a) state that --freezeTime is a no-op / harmless when the app isn't faketime (so "always" is safe), or (b) move it out of the "always" set and gate it on the faketime condition in both the table and the SKILL. Right now the doc tells the agent two different things.
Fixed the contradiction by treating --freezeTime as conditional everywhere (your option b). Dropped the table title 'Replay flags the agent always uses' → 'Replay flags', the intro now says 'required on every replay — except --freezeTime, which is added only when the app is built with the Go faketime agent', and the skill's canonical command shows it bracketed ([--freezeTime]) with 'drop --freezeTime for non-faketime apps' + 'ONLY when … (omit it otherwise)'. The agent is no longer told to pass it unconditionally. af641cb.
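As an illustration of the conditional `--freezeTime` handling agreed in this thread, here is a small Python sketch that assembles the canonical replay command. The helper names (`build_replay_command`, `render`) and the `faketime_build` parameter are hypothetical; the flags and their order come from the canonical command quoted earlier in this review, and nothing here calls the Keploy CLI.

```python
import shlex


def build_replay_command(app, branch, cluster, faketime_build):
    """Return the argv for a smart-set replay on a branch.

    --freezeTime is appended only for apps built with the Go faketime
    agent; every other flag is passed on every replay.
    """
    argv = [
        "keploy", "cloud", "replay",
        "--app", app,                  # <ns.deployment>
        "--branch-name", branch,       # replay scopes with --branch-name
        "--cluster", cluster,          # origin.clusterName
        "--replay-source", "smart-set",
    ]
    if faketime_build:
        argv.append("--freezeTime")
    argv += ["--disableReportUpload=false", "--strict-failure"]
    return argv


def render(argv):
    """Render an argv list as a shell-safe single-line string."""
    return shlex.join(argv)
```

For example, `render(build_replay_command("default.orders", "feature/x", "staging", False))` yields a command with no `--freezeTime`; the app, branch, and cluster values are placeholders.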
### Routine B — add new smart tests

1. **Identify changed endpoints** from the git diff.
2. **Capture traffic** with `keploy record --sync --disable-mapping=false`, driving one realistic request per new/changed endpoint.
Nit / consistency: the body shows keploy record --sync --disable-mapping=false with no command to run, while SKILL.md B2 correctly shows keploy record -c "<cmd>" --sync --disable-mapping=false. A reader copying this line literally records nothing. Suggest matching the SKILL: include -c "<cmd>".
Good catch — fixed. The body now shows keploy record -c "<run command>" --sync --disable-mapping=false, matching SKILL.md B2. A literal copy now actually runs the app. af641cb.
## Hard rules

0. **Native MCP transport only.** Verify the Keploy MCP tools are loaded. If your tool list shows only the meta-tools (`get_auth_status`, `search_tools`, `get_tool_schema`, `invoke_tool`), the real tools are hidden server-side to save context — fetch their schemas in ONE batched `get_tool_schema({names:[…]})` call, then run each via `invoke_tool({name, arguments})`. Smart-set names: `listApps`, `getApp`, `listBranches`, `create_branch`, `listTestReports`, `getTestReportFull`, `listSmartTestCases`, `updateSmartTestCase`, `setSmartTestCaseObsolete`, `deleteSmartTestCase`, `upsertSmartMock`, `deleteSmartMock`, `getMock`, `uploadRecordingBundle`.
Minor: listBranches is listed in the tool-name set, but no routine uses it — Discovery resolves the branch via create_branch (find-or-create) and caches branch_id. If it's genuinely unused in these routines, drop it to keep the name list tight (every name here is something you're telling the agent to fetch a schema for).
Dropped listBranches from the name list — you're right, no routine uses it (the branch is resolved via create_branch find-or-create and cached). The list is now exactly the tools the routines fetch. af641cb.
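The "one batched schema fetch, then invoke by name" pattern from Hard rule 0 can be sketched as below. This is only an illustration: `call_tool` is a hypothetical callable standing in for whatever MCP transport the client provides, and the shape of its return value is not specified here. The argument shapes (`{names: [...]}` and `{name, arguments}`) follow the rule text.

```python
# The tools the routines actually fetch schemas for (no listBranches).
SMART_SET_TOOLS = [
    "listApps", "getApp", "create_branch", "listTestReports",
    "getTestReportFull", "listSmartTestCases", "updateSmartTestCase",
    "setSmartTestCaseObsolete", "deleteSmartTestCase", "upsertSmartMock",
    "deleteSmartMock", "getMock", "uploadRecordingBundle",
]


def load_schemas(call_tool, names=SMART_SET_TOOLS):
    """Fetch every schema in ONE batched get_tool_schema call."""
    return call_tool("get_tool_schema", {"names": list(names)})


def invoke(call_tool, name, arguments):
    """Run a hidden tool by name through the invoke_tool meta-tool."""
    return call_tool("invoke_tool", {"name": name, "arguments": arguments})
```

Keeping the name list tight matters here because every entry adds to the single schema request the agent makes up front.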
- Meta-tool list: aligned to the same five as the legacy pages (added get_setup_instructions) and keyed tool-search detection on 'none of the domain tools listed' so an extra onboarding tool can't cause misclassification.
- --freezeTime: it's conditional (faketime builds only), so dropped the 'always uses' table title and marked it optional in both the table intro and the skill's canonical command ([--freezeTime], 'omit for non-faketime apps') — the doc no longer tells the agent to pass it unconditionally.
- Routine B record line: added -c "<run command>" (the body had dropped it, so a literal copy recorded nothing); now matches SKILL.md B2.
- Dropped listBranches from the tool-name list — no routine uses it (branch is resolved via create_branch).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: charankamarapu <charan@keploy.io>
charankamarapu left a comment
Reviewed as a docs/correctness pass. Overall this is a strong, carefully-written page — the schema_ref/branch-native model is explained well, the two routines map cleanly onto the embedded SKILL.md, and the cross-links (deduplication, time-freezing, the #mcp-server-... anchor) all resolve. The tool-search updates to the two legacy docs are accurate and consistent with the meta-tool set used everywhere else.
A few small consistency/maintainability points below — none are blockers.
1. Branch flag is spelled two different ways on the same page.
Replay uses --branch-name <git branch> (flags table L105, canonical command L155) but the upload command in Routine B3 (L173) uses --branch <git branch>. If that asymmetry is real (the two CLI subcommands genuinely take different flag names), it's worth a one-line note so a reader doesn't read it as a typo and "fix" it. If it's not intentional, one of them is wrong.
2. --app is missing from the "Replay flags" table.
The table intro (L99) says these flags "are required on every replay," but --app <ns.deployment> — which the canonical command (L155) and Routine B all pass — isn't listed. Either add an --app row or soften the intro to "the smart-set-specific flags."
3. Tool-search guidance is now duplicated in three places.
The same explanation (meta-tools list + get_tool_schema/search_tools/invoke_tool reachability) now lives in this page's SKILL.md Hard rule 0, k8s-proxy-llm-workflow.md Hard rule 0, and the new note in agent-test-generation.md. That's fine for now, but if the meta-tool set or the recommended call pattern changes, all three drift independently. Consider pointing the two skill docs at the agent-test-generation.md note as the canonical explanation in a follow-up.
- **B1 — Identify changes.** `git diff origin/main...HEAD --name-only`, filter to HTTP handlers, list each endpoint's method+path.
- **B2 — Capture traffic.** Pre-flight the run command, then `keploy record -c "<cmd>" --sync --disable-mapping=false` (both flags mandatory), drive one realistic request per endpoint, stop the recorder by PID.
- **B3 — Upload onto the branch.** `keploy upload test-set --app <ns.deployment> --branch <git branch> --test-set keploy/test-set-N --smart-test-set --name <name>` (ingests new contracts as `imported-*`, dedup by `schema_ref`).
The upload command here uses --branch <git branch>, but every replay command on this page uses --branch-name (flags table L105, canonical command L155). Same page, two spellings of the branch flag. If keploy upload test-set really takes --branch while keploy cloud replay takes --branch-name, a half-sentence noting the difference would prevent a reader from reading it as a typo. If they should match, this is a bug.
Confirmed real, not a typo — verified against the CLI:
keploy cloud replay --help → --branch-name (scope to a Keploy branch; --branch here is a CI test-data field)
keploy upload test-set --help → --branch (UUID or name, find-or-create; no --branch-name)
So replay scopes with --branch-name and upload with --branch — different subcommands, different flag designs. Added a 'Flag-name asymmetry (not a typo)' note on the --branch-name row so nobody 'fixes' it. 37c97bc.
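To make the asymmetry harder to "fix" by accident, a caller could keep the branch flag in a per-subcommand lookup. The sketch below is illustrative only: `BRANCH_FLAG`, `branch_args`, and `build_upload_command` are hypothetical names, while the flag spellings and the upload flags are taken from the `--help` output and the B3 command quoted in this thread.

```python
# Branch flag per subcommand, as verified against the CLI help in this thread.
BRANCH_FLAG = {
    ("cloud", "replay"): "--branch-name",   # replay: --branch is a CI test-data field
    ("upload", "test-set"): "--branch",     # upload: UUID or name, find-or-create
}


def branch_args(subcommand, branch):
    """Return the branch flag and value for the given subcommand."""
    return [BRANCH_FLAG[subcommand], branch]


def build_upload_command(app, branch, test_set, name):
    """Return the argv for uploading a recorded test set onto a branch."""
    sub = ("upload", "test-set")
    return [
        "keploy", *sub,
        "--app", app,
        *branch_args(sub, branch),
        "--test-set", test_set,
        "--smart-test-set",
        "--name", name,
    ]
```

With a lookup like this, the two spellings live in one place and a reader sees the difference is deliberate.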
## Replay flags

When the agent runs `keploy cloud replay` for a smart-set app, these flags are required on **every** replay — except `--freezeTime`, which is added **only** when the app is built with the Go `faketime` agent:
This says the listed flags "are required on every replay," but the table omits --app <ns.deployment>, which the canonical command (L155) and Routine B both pass. Suggest either adding an --app row or rewording to "the smart-set-specific flags" so the table doesn't read as the complete required set.
Added an --app <ns.deployment> row to the table — it's required by every replay (and upload), it was just only shown in the canonical command. The intro's 'required on every replay' now matches the table. 37c97bc.
…oad branch-flag asymmetry

Review follow-ups on #883:
- Verified against the CLI: keploy cloud replay scopes a branch with --branch-name (its --branch is a CI test-data field), while keploy upload test-set has only --branch (find-or-create). The asymmetry is real, not a typo — added a note on the --branch-name table row so a reader doesn't 'fix' it.
- Added the --app <ns.deployment> row to the Replay flags table (it's required by every replay/upload but was only in the canonical command).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: charankamarapu <charan@keploy.io>
On the 3rd point (tool-search guidance duplicated across the SKILL.md Hard rule 0,
charankamarapu left a comment
Reviewed as a principal-engineer pass. This is a clean, well-structured docs addition — the cross-page anchor (#mcp-server-recommended-for-ai-agents), sibling links (deduplication, time-freezing), ProductTier props, and the sidebar registration all check out, and the --branch-name vs --branch flag-asymmetry callout is a genuinely nice touch that will save readers a debugging session. The replay-flags table is consistent with the canonical command embedded in SKILL.md. Two consistency nits below — neither is blocking.
Nice work overall. 👍
- **Case 1 — App regression.** Edit/revert the application source, rebuild the image, replay. Don't touch the test.
- **Case A — Value drift.** `updateSmartTestCase` — `noiseJson` for non-deterministic fields, `respBody` for a real value change.
- **Case B — Shape drift.** `updateSmartTestCase` with `requestJson`/`responseJson`; resolve a `SchemaRefConflict` by obsoleting/deleting the twin, never by blind retry.
- **Case C — Mock drift.** `upsertSmartMock` for an in-place value drift; re-record when the outbound request changed or the match key can't be hand-authored.
Inconsistent case labels. The classification list mixes a number with letters: Case 1 — App regression, then Case A, Case B, Case C. That ordering reads as a typo and diverges from the "How it works" section above, which uses clean parallel names (Regression / Value drift / Shape drift / Mock drift).
Suggest renumbering consistently — e.g. Case 1–4 — or, better, reuse the same four labels as the prose section so the SKILL.md and the page narrative line up one-to-one.
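One way to picture the one-to-one labelling suggested here is a single enumeration shared by the prose and the skill. The sketch below is hypothetical (the `Failure` enum and `PRIMARY_TOOL` mapping are not part of the skill); the four labels and the tool each case reaches for first are taken from the quoted classification list.

```python
from enum import Enum


class Failure(Enum):
    """The four failure classes, using the 'How it works' labels."""
    REGRESSION = "Regression"
    VALUE_DRIFT = "Value drift"
    SHAPE_DRIFT = "Shape drift"
    MOCK_DRIFT = "Mock drift"


# First MCP tool each class reaches for; None means fix the app, not the test.
PRIMARY_TOOL = {
    Failure.REGRESSION: None,                    # edit/revert source, rebuild, replay
    Failure.VALUE_DRIFT: "updateSmartTestCase",  # noiseJson or respBody
    Failure.SHAPE_DRIFT: "updateSmartTestCase",  # requestJson/responseJson
    Failure.MOCK_DRIFT: "upsertSmartMock",       # or re-record if the request changed
}
```

A shared set of names like this avoids the mixed "Case 1 / Case A" numbering because each class has exactly one label.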
## Hard rules

0. **Native MCP transport only.** Verify the Keploy MCP tools are loaded. If your tool list shows only the meta-tools (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`) and none of the Smart-set names below, the real tools are hidden server-side to save context — fetch their schemas in ONE batched `get_tool_schema({names:[…]})` call, then run each via `invoke_tool({name, arguments})`. Smart-set names: `listApps`, `getApp`, `create_branch`, `listTestReports`, `getTestReportFull`, `listSmartTestCases`, `updateSmartTestCase`, `setSmartTestCaseObsolete`, `deleteSmartTestCase`, `upsertSmartMock`, `deleteSmartMock`, `getMock`, `uploadRecordingBundle`.
Tool names here are bare, but the companion doc warns they vary by editor. Hard rule 0 lists exact names (listApps, create_branch, …) and instructs the agent to fetch schemas via get_tool_schema({names:[…]}). The legacy quickstart/k8s-proxy-llm-workflow.md Hard rule 0 explicitly cautions that names differ per client (keploy-<tool> or mcp__keploy*__<tool>).
An agent in Cursor whose tools surface as keploy-listApps would pass the wrong literal to an exact-name get_tool_schema call and get nothing back. Worth adding the same per-editor caveat here, or a line pointing to search_tools(query) as the fallback when an exact name misses — otherwise this skill silently assumes the unprefixed naming.
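A possible shape for the per-editor handling is to normalize surfaced names back to bare names before any exact-name lookup. This is a sketch under assumptions: the helper `bare_tool_name` is hypothetical, and the only prefix forms it knows are the two the legacy doc mentions (`keploy-<tool>` and `mcp__keploy*__<tool>`); other clients may surface names differently, in which case `search_tools(query)` remains the fallback.

```python
import re

# Matches the two prefix forms named in the legacy doc:
#   keploy-<tool>   and   mcp__keploy*__<tool>
_CLIENT_PREFIX = re.compile(r"^(?:mcp__keploy.*?__|keploy-)")


def bare_tool_name(surfaced):
    """Strip a known client prefix, leaving the bare tool name.

    Names without a recognized prefix are returned unchanged.
    """
    return _CLIENT_PREFIX.sub("", surfaced, count=1)
```

The non-greedy match stops at the first double underscore after `mcp__keploy`, so tool names that contain single underscores, such as `create_branch`, survive intact.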
What
Adds a new docs page — AI Agent for Smart Test Sets (keploy-cloud/smart-set-agent) — documenting the ready-made agent skill that lets an AI coding assistant (Claude Code, Cursor, and similar) operate Keploy's smart test set end-to-end over MCP:
The page covers prerequisites, installing the skill, the key concepts (schema_ref identity, branch-first enforcement, the branch boundary), the two routines, the mandatory keploy cloud replay flags, limitations, and the full SKILL.md to install.
Placement
- version-4.0.0-sidebars.json.
- deduplication, time-freezing, agent-test-generation#mcp-server-...).
Validation
- docusaurus build succeeds — no broken links/anchors, no MDX errors.
- rebase to the Base vocabulary — a git term the branching docs use.
- --check: clean.
🤖 Generated with Claude Code
Also in this PR — tool-search note for the legacy agent docs
The lean-MCP change (api-server #1808) makes tool-search the default for all MCP clients: tools/list shows only the meta-tools, and the full catalog is reached by name via get_tool_schema/search_tools + invoke_tool (tools stay callable — hiding affects discovery, not reachability). The new smart-set guide already covers this in its skill (Hard rule 0), but two existing legacy agent docs still assumed the full catalog is listed, so they're updated here:
- quickstart/k8s-proxy-llm-workflow.md — Hard rule 0 now explains that "only the meta-tools" is tool-search mode, not a missing config, so an agent doesn't misread the short list as "MCP not configured" and stop.
- running-keploy/agent-test-generation.md — added a tool-search note by the MCP section and corrected the "discovers tools via tools/list" step.